Part 0: Setup

20 Inference Steps

15 Inference Steps

25 Inference Steps

The seed I used was 1998. I set the number of inference steps to 15 for the top row and 25 for the bottom row. The main thing I noticed across the captions and their outputs is that the features of each image change from one generation to the next. In every image, the result is blurrier at the lower number of inference steps.

For example, "an oil painting of a snowy mountain village" looks less detailed at 15 inference steps, while at 25 steps it has far more color and design. "A man wearing a hat" looks like a blurry, old-timey photo at 15 steps but more like a model photo at 25 steps. "A rocket ship" has less detail at 15 steps and more detail and definition at 25 steps.

Part 1: Understanding the Forward & Reverse Processes

1.1 Forward Process (Adding Noise)
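
As a rough illustration of what the forward process does, the sketch below uses the standard DDPM form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where eps is standard Gaussian noise. The linear beta schedule, the names `make_alphas_cumprod` and `forward`, the toy five-pixel "image", and the timesteps 250, 500, and 750 are all assumptions chosen for illustration, not values confirmed by this write-up.

```python
import math
import random


def make_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alphas_cumprod = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


def forward(x0, t, alphas_cumprod, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, 1)."""
    abar = alphas_cumprod[t]
    signal_scale = math.sqrt(abar)
    noise_scale = math.sqrt(1.0 - abar)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(1998)  # fixed seed so the noise is reproducible
alphas_cumprod = make_alphas_cumprod()
x0 = [0.0, 0.25, 0.5, 0.75, 1.0]  # toy flattened "image" (hypothetical data)

for t in (250, 500, 750):
    # abar_t shrinks as t grows, so less signal and more noise survive
    print(t, round(alphas_cumprod[t], 4), forward(x0, t, alphas_cumprod, rng))
```

Because abar_t decreases with t, the signal term is scaled down while the noise term grows, which is why images at later timesteps look progressively noisier.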

1.2 Traditional Denoising (Gaussian Blur)

1.3 One-Step Denoising (Using a Pretrained UNet)
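
One way to picture one-step denoising: if a network supplies an estimate of the noise in x_t, the relation x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps can be solved for x_0 in a single step. The sketch below is a minimal illustration under that assumption. It does not call a UNet; it plugs in the true noise as a stand-in for the network's prediction, and the name `estimate_clean`, the value `alpha_bar_t = 0.3`, and the toy data are hypothetical.

```python
import math
import random


def estimate_clean(x_t, eps_hat, alpha_bar_t):
    """Solve x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps for x0, given a noise estimate."""
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(xt - noise_scale * e) / signal_scale for xt, e in zip(x_t, eps_hat)]


rng = random.Random(1998)
alpha_bar_t = 0.3  # illustrative noise level, not taken from the project
x0 = [0.0, 0.25, 0.5, 0.75, 1.0]  # toy flattened "image" (hypothetical data)

# Build a noisy sample from known noise
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = [
    math.sqrt(alpha_bar_t) * v + math.sqrt(1.0 - alpha_bar_t) * e
    for v, e in zip(x0, eps)
]

# Stand-in for the UNet: use the true noise as the "prediction"
x0_hat = estimate_clean(x_t, eps, alpha_bar_t)
max_error = max(abs(a - b) for a, b in zip(x0, x0_hat))
print(max_error)
```

With a perfect noise estimate the clean signal is recovered up to floating-point error. A real network's estimate is imperfect, and the error is amplified by 1 / sqrt(abar_t), so one-step estimates tend to degrade at higher noise levels.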

1.4 Iterative Denoising

Part 2: Understanding the Forward & Reverse Processes

2.1 Diffusion Model Sampling

2.2 Classifier Free Guidance
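
As a rough illustration of the idea behind classifier-free guidance, the sketch below combines an unconditional and a conditional noise estimate as eps = eps_u + gamma * (eps_c - eps_u). The function name `cfg_combine`, the toy two-element vectors, and the guidance scale of 7 are assumptions for illustration, not values taken from this project.

```python
def cfg_combine(noise_uncond, noise_cond, guidance_scale):
    """Return eps_u + gamma * (eps_c - eps_u), element by element."""
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_cond)
    ]


# Toy noise estimates standing in for two UNet passes (hypothetical data)
noise_uncond = [0.0, 1.0]
noise_cond = [1.0, 3.0]

for scale in (0.0, 1.0, 7.0):
    # scale 0 keeps the unconditional estimate, scale 1 keeps the conditional one,
    # and scale > 1 extrapolates past the conditional estimate
    print(scale, cfg_combine(noise_uncond, noise_cond, scale))
```

Scales above 1 push the estimate further in the direction the prompt suggests, which is the usual explanation for why guided samples follow the prompt more closely.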

2.3 Image-to-image Translation

Part 3: Visual Anagrams (Graduate Students)